
The Plant Genome

Wiley

Preprints posted in the last 90 days, ranked by how well they match The Plant Genome's content profile, based on 53 papers previously published here. The average preprint has a 0.03% match score for this journal, so anything above that is already an above-average fit.

1
Dissecting genetic variance structure and evaluating genomic prediction models for single-cross hybrids derived from Stiff Stalk and Non-Stiff Stalk maize heterotic groups

Godoy, J. C.; Edwards, J.; Lee, E. C.; Mikel, M. A.; Fernandes, S. B.; Hirsch, C. N.; Berry, S. P.; Lipka, A. E.; Bohn, M. O.

2026-03-13 genetics 10.64898/2026.03.11.710575 medRxiv
Top 0.1%
25.9%

The early 20th-century discovery of heterosis and the establishment of heterotic groups transformed maize (Zea mays L.) into a keystone of global agriculture. However, maize breeding faces two significant challenges: the gradual decline of general combining ability (GCA) variance within heterotic groups and the impracticality of testing all possible single crosses in the early stages of a breeding program. Here, we developed genomic best linear unbiased prediction (GBLUP)-based multi-kernel models, using an additive and two alternative non-additive genomic relationship matrices, to estimate the variance components associated with the GCA of the Stiff Stalk (SS) and Non-Stiff Stalk (NSS) heterotic groups and with the specific combining ability (SCA) arising from their crosses. We further applied these models to predict the performance of untested single-cross combinations under varying levels of parental information. Both the SS and NSS groups retained significant GCA variance across traits in the early- and late-maturity groups. In contrast, the SS group showed no detectable GCA variance for grain yield in the intermediate-flowering subset of hybrids, highlighting a limitation for future genetic improvement. GBLUP-based multi-kernel models effectively identified superior hybrids when parental information was available; in its absence, however, they underperformed covariance-based approaches. Both types of non-additive matrices produced similar results, supporting the robustness of the inferred genetic architecture. Overall, this study informs the future use of US commercial maize germplasm and demonstrates how GBLUP-based multi-kernel models can improve the efficiency of hybrid breeding programs.
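The additive kernel in GBLUP models of this kind is often the VanRaden (2008) genomic relationship matrix. The abstract does not say which construction the authors used, so the Python sketch below assumes VanRaden's first method; the function name `vanraden_g` is hypothetical, and the two non-additive kernels are not shown.

```python
def vanraden_g(markers):
    """Additive genomic relationship matrix (VanRaden 2008, method 1).

    markers: one row per line, genotypes coded 0/1/2 as counts of the
    allele whose frequency p_j is estimated from these same rows.
    """
    n, m = len(markers), len(markers[0])
    # Frequency of the counted allele at each marker
    p = [sum(row[j] for row in markers) / (2 * n) for j in range(m)]
    # Center each marker by its expected genotype value, 2 * p_j
    z = [[row[j] - 2 * p[j] for j in range(m)] for row in markers]
    # Scaling makes G roughly comparable to a pedigree relationship matrix
    denom = 2 * sum(pj * (1 - pj) for pj in p)
    if denom == 0:
        raise ValueError("all markers are monomorphic")
    return [[sum(z[a][j] * z[b][j] for j in range(m)) / denom for b in range(n)]
            for a in range(n)]
```

In a multi-kernel hybrid model, matrices of this kind would typically be built separately for the SS and NSS parents, with an SCA kernel derived from them; how the authors built their kernels is not stated in the abstract.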

2
Genomic selection for seed yield enhances flax breeding efficiency

You, F. M.; Zheng, C.; Zagariah Daniel, J. J.; Li, P.; Jackle, K.; House, M.; Tar'an, B.; Cloutier, S.

2026-03-03 genomics 10.64898/2026.03.01.707406 medRxiv
Top 0.1%
22.7%

Genomic selection (GS) is a promising strategy to improve breeding efficiency for complex traits such as seed yield by enabling early selection and reducing reliance on extensive field testing. However, practical deployment of GS remains challenging because training populations are often small and prediction accuracy drops when models are applied to actual breeding germplasm. In this study, we evaluated GS for flax (Linum usitatissimum L.) seed yield under realistic breeding scenarios, focusing on across-population prediction (APP) and breeding decision support rather than model benchmarking. Using historical germplasm collections and a newly developed breeding-oriented population as training sets, we assessed GS performance across multiple independent test populations representing contemporary breeding lines evaluated in replicated yield trials. APP accuracies reached r = 0.84 when training and test populations were genetically aligned, supporting routine breeding deployment. Training population composition emerged as a key determinant of prediction success: breeding-oriented populations consistently outperformed broad germplasm collections for predicting actual breeding lines. Check-based selection analyses showed that GS reliably reproduced phenotypic advancement decisions while eliminating 61-91% of low-performing lines, reducing field evaluation costs by 48-78% for a typical cohort of 300 lines. Marker subsampling analyses further indicated that moderate-density genotyping-by-sequencing panels (~2,500-3,000 SNPs) are sufficient to achieve stable prediction accuracies. Overall, these results demonstrate that GS for seed yield in flax is ready for routine integration into breeding programs, offering a practical pathway to reduce costs, accelerate breeding cycles, and enhance selection efficiency.
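Prediction accuracy r in studies like this is usually the Pearson correlation between predicted and observed values, and check-based selection compares candidate lines with check cultivars. The abstract does not give the exact advancement rule, so the sketch below assumes a hypothetical one: advance a line when its genomic estimated breeding value (GEBV) is at least the mean predicted value of the checks. Function names are illustrative.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation between predicted and observed values."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def check_based_selection(gebv, check_gebv):
    """Hypothetical rule: advance lines whose GEBV is at least the check mean.

    gebv: dict mapping line name -> genomic estimated breeding value
    check_gebv: predicted values of the check cultivars
    Returns the advanced lines and the fraction of lines eliminated.
    """
    threshold = mean(check_gebv)
    advanced = sorted(name for name, value in gebv.items() if value >= threshold)
    eliminated_fraction = 1 - len(advanced) / len(gebv)
    return advanced, eliminated_fraction
```

The eliminated fraction is the quantity that drives field-cost savings: lines dropped on the basis of GEBVs never enter replicated yield trials.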

3
Predicting agronomic performance of maize landraces in various and future environments by combining genomic prediction and ecogenetics

Galaretto, A. O.; Pegard, M.; Malvar, R. A.; Moreau, L.; Butron, A.; Revilla, P.; Madur, D.; Combes, V.; Balconi, C.; Bauland, C.; Mendes-Moreira, P.; Sarcevic, H.; Barata, A. M.; Murariu, D.; Schierscher-Viret, B.; Stringens, A.; Andjelkovic, V.; Goritschnig, S.; Gouesnard, B.; Charcosset, A.; Nicolas, S. D.

2026-01-29 genomics 10.64898/2026.01.28.699851 medRxiv
Top 0.1%
18.3%

Maize traditional populations (landraces) hold valuable genetic diversity for addressing climate change and low-input agriculture but remain underutilized due to a lack of evaluations. High-throughput pool genotyping (HPG) has previously been used to characterize diversity, but its potential for implementing genomic prediction (GP) and genomic offset (GO) for maize landraces has not yet been tested. We developed HPG-based GP models, with or without GO and within-population gene diversity (Hs), calibrated using 397 European landraces evaluated across environments and use cases. GP alone showed high predictive ability for yield (0.75), plant height (0.92), and male flowering time (0.94). Including Hs and GO in the GP model improved the predictive ability for grain yield of new landraces in new environments by 13%. Our model provided phenotypic adaptive landscapes for each landrace under future climatic scenarios and predicted that the stability of agronomic performance increases with Hs. Combining GP with eco-genetic predictions made it possible to identify promising landraces for improving adaptation to future or new cultivation conditions.
Teaser: Identify promising landraces adapted to new and future environments by combining genomic selection and genomic offset.
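Within-population gene diversity (Hs) can be computed directly from pooled genotyping, which yields an allele-frequency estimate per SNP for each landrace rather than individual genotypes. A minimal sketch, assuming biallelic SNPs and using expected heterozygosity 2p(1 - p) averaged over loci; the authors' exact estimator, including any sample-size correction, is not stated, and the function name is illustrative.

```python
def within_population_diversity(allele_freqs):
    """Hs for one pool: mean expected heterozygosity 2p(1 - p) over biallelic SNPs.

    allele_freqs: estimated frequency of one allele at each SNP for this landrace.
    """
    if not allele_freqs:
        raise ValueError("need at least one SNP")
    return sum(2 * p * (1 - p) for p in allele_freqs) / len(allele_freqs)
```

Fixed SNPs (p = 0 or 1) contribute nothing, so a landrace with many fixed loci gets a low Hs.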

4
Multi-trait Multi-environment Genomic Prediction Strategies for Miscanthus sacchariflorus Populations

Proma, S.; Garcia-Abadillo, J.; Sagae, V. S.; Sacks, E.; Leakey, A. D. B.; Zhao, H.; Ghimire, B. K.; Lipka, A. E.; Njuguna, J. N.; Yu, C. Y.; Seong, E. S.; Yoo, J. H.; Nagano, H.; Anzoua, K. G.; Yamada, T.; Chebukin, P.; Jin, X.; Clark, L. V.; Petersen, K. K.; Peng, J.; Sabitov, A.; Dzyubenko, E.; Dzyubenko, N.; Glowacka, K.; Nascimento, M.; Campana Nascimento, A. C.; Dwiyanti, M. S.; Bagment, L.; Shaik, A.; Jarquin, D.

2026-03-23 genomics 10.64898/2026.03.18.712730 medRxiv
Top 0.1%
14.1%

Genomic selection has the potential to serve as a strategic tool for enhancing the genetic gain of complex traits in Miscanthus breeding programs. Developing improved cultivars requires assessing them for multiple traits across diverse environments to ensure suitable overall performance. Multi-trait multi-environment (MTME) genomic prediction (GP) models therefore offer an opportunity to improve selection accuracy. This study evaluated five GP models: (1) three MTME models, including genotype-by-trait-by-environment (GxExT) interaction, and (2) two single-trait multi-environment (STME) models, with and without GxE interaction. A Miscanthus sacchariflorus population comprising 336 genotypes evaluated in three environments and scored for four traits (biomass yield, YDY; total culm number, TCM; average internode length, AIL; and culm node number, CNN) was analyzed. The predictive ability of the models was evaluated using three cross-validation schemes resembling realistic scenarios: CV1, predicting new genotypes; CVP, predicting missing traits in a given environment; and CV2, predicting partially observed genotypes. On average, across all cross-validation schemes, the predictive ability of the MTME models was 10% to 70% higher than that of the STME models for TCM and AIL. For YDY and CNN, on the other hand, both STME models performed similarly to or slightly better (by 5% to 64%) than the MTME models in most environments. Although the MTME models did not outperform their STME counterparts for all traits, they improved prediction for genotypes that were untested across environments or lacked trait information in a specific environment. Overall, our study suggests that MTME GP models can be implemented in Miscanthus breeding programs to improve the predictive ability for complex traits, shorten breeding cycles, and accelerate selection decisions.
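The difference between the CV1 and CV2 schemes lies in what is masked. The hypothetical helpers below assign genotype-environment records to k folds: CV1 keeps all records of a genotype in one fold, so test genotypes are entirely unobserved, while CV2 assigns records independently, so a test genotype is usually observed in other environments. The CVP scheme, which masks traits rather than records, is not shown.

```python
import random


def cv1_folds(genotypes, environments, k, seed=1):
    """CV1: all records of a genotype share one fold (predict new genotypes)."""
    rng = random.Random(seed)
    order = list(genotypes)
    rng.shuffle(order)
    fold_of = {g: i % k for i, g in enumerate(order)}
    return {(g, e): fold_of[g] for g in genotypes for e in environments}


def cv2_folds(genotypes, environments, k, seed=1):
    """CV2: genotype-environment records are assigned independently, so a
    masked genotype is usually still observed in other environments."""
    rng = random.Random(seed)
    records = [(g, e) for g in genotypes for e in environments]
    rng.shuffle(records)
    return {record: i % k for i, record in enumerate(records)}
```

Each fold is then masked in turn and predicted from the remaining folds.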

5
Improved Ensemble Performance by Weight Optimisation for the Genomic Prediction of Maize Flowering Time Traits

Tomura, S.; Powell, O. M.; Wilkinson, M. J.; Lefevre, J.; Cooper, M.

2026-02-06 bioinformatics 10.64898/2026.02.03.703660 medRxiv
Top 0.1%
12.6%

Ensembles of multiple genomic prediction models have shown improved prediction performance over the individual models that contribute to them. This is expected from the Diversity Prediction Theorem, which states that the squared error of an ensemble average equals the mean squared error of its individual models minus their diversity (the spread of individual predictions around the ensemble prediction); an ensemble of diverse models therefore has a lower prediction error than the mean error of its individual models. While a naive ensemble-average model, which aggregates all individual models with equal weights, provides a baseline improvement, optimising the weight of each individual model could further enhance ensemble prediction performance. The weights can be optimised according to how informative each model is with respect to prediction error and diversity. Here, we evaluated weighted ensemble-average models with three weight optimisation approaches (linear transformation, Nelder-Mead, and Bayesian) using flowering time traits from two maize nested association mapping (NAM) datasets: TeoNAM and MaizeNAM. All three weighted ensemble-average approaches improved prediction performance in several of the prediction scenarios investigated. In particular, the weighted ensembles enhanced prediction performance when the optimised weights differed substantially from the equal weights used by the naive ensemble. Among the weighted ensembles, no approach was clearly superior in either prediction accuracy or error across the prediction scenarios. Weight optimisation in ensembles warrants further investigation; for example, integrating a weighted ensemble with simultaneous hyperparameter tuning may offer a promising direction for further research.
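The Diversity Prediction Theorem (also called the ambiguity decomposition) can be checked numerically for a single observation. The sketch below also accepts weights that sum to one, for which the same identity holds; this is why weight optimisation trades individual model error against diversity. Because the identity holds per observation, it also holds for errors averaged over observations. The function name is illustrative, not taken from the paper.

```python
def ambiguity_decomposition(predictions, truth, weights=None):
    """Return (ensemble error, weighted member error, diversity) for one observation.

    With weights summing to 1 (equal weights by default), the identity
    ensemble_error == member_error - diversity holds exactly.
    """
    if weights is None:
        weights = [1 / len(predictions)] * len(predictions)
    if abs(sum(weights) - 1) > 1e-9:
        raise ValueError("weights must sum to 1")
    ensemble = sum(w * p for w, p in zip(weights, predictions))
    ensemble_error = (ensemble - truth) ** 2
    # Weighted mean squared error of the individual models
    member_error = sum(w * (p - truth) ** 2 for w, p in zip(weights, predictions))
    # Weighted spread of the individual predictions around the ensemble
    diversity = sum(w * (p - ensemble) ** 2 for w, p in zip(weights, predictions))
    return ensemble_error, member_error, diversity
```

For example, with predictions 1, 2 and 3 and a true value of 0, the equal-weight ensemble error (4) equals the mean member error (14/3) minus the diversity (2/3).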

6
Double Reduction in Allotetraploid Peanut and the Role of Chromosomal Imbalance in Unexpected Linkage Map Artifacts

Lamon, S.; Bourke, P. M.; Abernathy, B. L.; dos Santos, J. F.; de Godoy, I. J.; Leal-Bertioli, S. C. M.; Bertioli, D. J.

2026-02-14 genetics 10.64898/2026.02.12.704920 medRxiv
Top 0.1%
12.6%

Polyploidization in peanut (Arachis hypogaea L.) provided evolutionary advantages by increasing heterosis and the response to selection and by enhancing adaptability. However, it also caused a genetic bottleneck by isolating cultivated peanut from its wild diploid relatives. Mechanisms such as homoeologous exchange can partially restore genetic diversity by generating new allelic combinations. Double reduction is a rare segregation pattern restricted to polyploids, in which a single-dosage locus yields duplex gametes. It requires multivalent formation and crossing over between non-sister chromatids, both of which are associated with homoeologous exchange. Although peanut mainly exhibits disomic pairing, occasional multivalents theoretically allow double reduction at low frequency. To estimate the frequency of double reduction and examine its relationship with genetic instability, we constructed a high-density phased linkage map using a backcross population from a cross between a neoallotetraploid [A. magna K 30097 x A. stenosperma V 15076]4x (MagSten) and cultivated peanut. The final map included 9,717 SNP markers with an average spacing of 0.22 centimorgans. Some progenies showed unbalanced genomic compositions, which created artifacts in the linkage analysis. Removing these progenies improved the map and suggested a common origin for artifacts previously observed in other linkage maps, revealing a novel aspect of mapping in allotetraploid peanut. Analysis of the phased map revealed double reduction in 12% of progenies. Notably, one event produced a genomic composition consistent with theoretical predictions, supporting the expectation that double reduction causes unbalanced genomes in allopolyploids. These results indicate that double reduction occurs at a low but significant frequency in segmental allotetraploid peanut, contributing to the genetic instability and evolutionary dynamics of this and likely other allopolyploid genomes.
Article Summary: This study investigated double reduction, a rare genetic event in segmental allopolyploid peanut that can create unbalanced genomic compositions and affect genetic diversity. We generated a backcross population from neoallotetraploid and cultivated peanuts and then constructed a high-density phased linkage map. The analysis revealed unbalanced genomic compositions, caused by homoeologous exchanges, in some progenies, which reduced map quality. Double reduction was estimated to occur in approximately 12% of progenies, in line with theoretical expectations for genomic imbalance. These results demonstrate that double reduction contributes to genetic instability, inheritance patterns, and genome evolution in allopolyploid organisms such as peanut.
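To make "a single-dosage locus yields duplex gametes" concrete, the textbook model for an autotetraploid simplex parent (Aaaa) gives gamete frequencies as a function of the double reduction rate alpha. This is only an illustration under that assumption: peanut is an allotetraploid with mostly disomic pairing, and the study estimated double reduction from a phased map rather than from this formula. The function name is hypothetical.

```python
from fractions import Fraction


def simplex_gamete_frequencies(alpha):
    """Gamete frequencies from a simplex (Aaaa) autotetraploid parent.

    alpha is the double reduction rate: 0 under random chromosome
    segregation, 1/7 under random chromatid segregation.
    """
    alpha = Fraction(alpha)
    if not 0 <= alpha <= 1:
        raise ValueError("alpha must lie in [0, 1]")
    return {
        "AA": alpha / 4,        # duplex gametes: only possible via double reduction
        "Aa": (1 - alpha) / 2,  # simplex gametes
        "aa": (2 + alpha) / 4,  # nulliplex gametes
    }
```

Without double reduction the parent transmits A in at most one copy; at alpha = 1/7, one gamete in 28 carries two copies of A.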

7
Landrace and bred accessions of allotetraploid sour cherry (Prunus cerasus L.) reveal variation in subgenome dosage and subgenome expression bias

Rhoades, K. E. B.; Goeckeritz, C. Z.; Bird, K. A.; Yocca, A. E.; Edger, P. P.; Iezzoni, A.

2026-02-20 genetics 10.64898/2026.02.19.706907 medRxiv
Top 0.1%
12.2%

Subgenome dominance is a phenomenon observed in many allopolyploids in which one parental genome exerts a stronger influence over phenotype than the other parental genome(s). It may manifest as preferential retention of one subgenome during fractionation, as replacement via homoeologous exchange, or as subgenome expression bias, in which one subgenome is expressed at higher abundance than the others. Sour cherry (Prunus cerasus) is an allotetraploid fruit tree species resulting from an interspecific cross between extant relatives of ground cherry (P. fruticosa) and sweet cherry (P. avium). Prior comparative genomic analyses suggest that the sour cherry cultivar Montmorency contains three subgenomes: A and A′, each present in one copy and derived from a P. fruticosa-like ancestor, and B, present in two copies and derived from a P. avium-like ancestor. In this study, we investigated the dynamics of these three subgenomes in four diverse landraces and two cultivars, including Montmorency. We found evidence of 26 homoeologous exchange events and five whole-homoeolog replacements relative to Montmorency in three of the six accessions. We also detected subgenome expression bias favoring the A and A′ subgenomes over the B subgenome; its magnitude differs between accessions and changes over the course of fruit development. Lastly, we show differences in dosage variation and expression bias for four genes previously described in Montmorency as associated with fruit softening, a key trait in this crop. These findings on subgenome dominance offer valuable insights into how this phenomenon may influence traits important for sour cherry breeding.

8
From Diversity to Discovery: Genome-Wide Insights into the Genetic Landscape of Tropical Maize DH Lines

Oli, A.; Benor, S.; Haile, G.; Tadesse, B.; Beyene, Y.; Amudu, M. K.; Gowda, M.

2026-02-05 genetics 10.64898/2026.02.03.703488 medRxiv
Top 0.1%
9.8%

Understanding the extent and structure of genetic diversity within breeding populations is essential for sustaining long-term genetic gain in maize improvement programs. In this study, a panel of 2,555 maize doubled haploid (DH) lines representing diverse genetic backgrounds was genotyped with 3,305 high-quality single nucleotide polymorphism (SNP) markers to assess genome-wide diversity, population structure, and relatedness. The SNPs were distributed across all ten chromosomes, with marker density varying among genomic regions. Diversity indices revealed moderate polymorphism, with a mean gene diversity of 0.38 and a mean polymorphic information content (PIC) of 0.30, while minor allele frequency ranged from 0.04 to 0.50. The low observed heterozygosity (0.04) and high fixation index (0.89) confirmed the expected homozygosity of DH lines. Population structure analysis using sparse non-negative matrix factorization (sNMF) and principal coordinate analysis (PCoA) consistently identified two major genetic clusters corresponding to the established heterotic groups used in CIMMYT's tropical maize breeding pipelines. Analysis of molecular variance (AMOVA) indicated that 36% of genetic variation occurred among populations, 58% among individuals within populations, and 6% within individuals (P = 0.001), confirming significant population differentiation and high within-group diversity. These results show that the DH panel is a genetically diverse and well-structured population with limited relatedness among lines. The distinct clustering by heterotic group, coupled with substantial within-group variation, provides a strong foundation for genome-wide association studies, genomic selection, and allele mining for complex adaptive traits. The panel's diversity and structure make it a valuable genomic resource for dissecting trait architecture and accelerating genetic gain in tropical maize breeding programs targeting sub-Saharan Africa and similar environments.
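The diversity indices reported above have simple closed forms for a biallelic SNP. A minimal sketch, assuming 0/1/2 genotype coding with None for missing calls: gene diversity is the expected heterozygosity He = 1 - p^2 - q^2, PIC follows Botstein et al. (1980) for two alleles, and the fixation index is taken as F = 1 - Ho/He. Panel-level values would be means over SNPs, and software packages may apply sample-size corrections not shown here. The function name is illustrative.

```python
def snp_diversity(genotypes):
    """Diversity statistics for one biallelic SNP.

    genotypes: calls coded 0/1/2 (copies of one allele), None for missing.
    """
    calls = [g for g in genotypes if g is not None]
    if not calls:
        raise ValueError("no called genotypes")
    n = len(calls)
    p = sum(calls) / (2 * n)          # frequency of the counted allele
    q = 1 - p
    he = 1 - (p ** 2 + q ** 2)        # gene diversity (expected heterozygosity)
    pic = he - 2 * p ** 2 * q ** 2    # Botstein et al. (1980), two alleles
    ho = sum(1 for g in calls if g == 1) / n
    f = 1 - ho / he if he > 0 else float("nan")
    return {"maf": min(p, q), "gd": he, "pic": pic, "ho": ho, "f": f}
```

For a biallelic SNP, PIC cannot exceed 0.375 (reached at p = 0.5), which is why a mean PIC of 0.30 indicates moderate rather than low polymorphism.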

9
Natural variation in rice mitogen-activated protein kinase 4 contributes to increased photosynthetic rate under field conditions

Ueda, T.; Adachi, S.; Sugimoto, K.; Maeda, M. H.; Yamanouchi, U.; Mizobuchi, R.; Taniguchi, Y.; Hirasawa, T.; Yamamoto, T.; Tanaka, J.

2026-03-09 plant biology 10.64898/2026.03.06.710232 medRxiv
Top 0.1%
9.7%

Improving rice (Oryza sativa L.) yield requires a balanced enhancement of both sink size and source capacity. While many QTLs for sink size have been identified, only a few are known for source capacity, which is essential for achieving high yield. Here, we identified qHP10 as a major QTL for increased photosynthetic rate using chromosome segment substitution lines derived from a cross between the high-yielding indica cultivar Takanari and the average-yielding japonica cultivar Koshihikari. High-resolution mapping combined with CRISPR/Cas9-induced mutagenesis revealed that the causative gene underlying qHP10 is Mitogen-Activated Protein Kinase 4 (OsMPK4). A near-isogenic line carrying the Takanari allele of OsMPK4 (NIL-OsMPK4) had a 15-25% higher photosynthetic rate than Koshihikari. NIL-OsMPK4 also had higher stomatal conductance than Koshihikari but similar stomatal pore size and density, indicating that a greater stomatal aperture underlies the higher photosynthetic rate. This enhancement is likely attributable to down-regulation of OsMPK4 expression, which increases stomatal conductance and thus promotes CO2 uptake. Our findings demonstrate that OsMPK4 is a promising genetic target for increasing source capacity and, potentially, rice yield through molecular breeding.

10
Development and evaluation of a cost-effective, mid-density SNP array as a sorghum community genotyping resource

Kumar, V.; Klein, R. R.; Kaufman, B.; Winans, N. D.; Crozier, D.; Rooney, W. L.; Harrison, M.; Hayes, C.; Tello-Ruiz, M. K.; Gladman, N. P.; Olson, C.; Burow, G.; Sexton-Bowser, S.; Punnuri, S.; Knoll, J.; Dahlberg, J.; Ware, D.

2026-02-23 plant biology 10.64898/2026.02.20.706663 medRxiv
Top 0.1%
9.7%

The development of accessible and cost-effective genotyping platforms is essential to accelerate genetic gain in crop improvement. To address the U.S. sorghum community's need for a standardized, mid-density genotyping resource, we developed and validated a targeted single-nucleotide polymorphism (SNP) array using the PlexSeq next-generation sequencing (NGS) platform. The resulting array includes 2,421 SNPs spanning all ten Sorghum bicolor chromosomes and integrates trait-linked and quality control markers selected by public and private stakeholders. Genotyping 2,726 diverse accessions, including the Sorghum Association Panel (SAP), demonstrated high call rates (>90% for most samples and markers), low missing data, and accurate resolution of population structure consistent with prior whole-genome studies. In comparative genomic prediction analyses, the mid-density array performed equivalently to high-density genotyping-by-sequencing (GBS) platforms for key traits such as grain yield and plant height across multi-environment trials. Designed for broad utility in breeding pipelines, the array enables marker-assisted selection, genomic prediction, identity verification, and germplasm quality control. Moreover, its adoption by the USDA National Plant Germplasm System facilitates genebank curation and the management of core collections. This community-driven genotyping platform offers a scalable, reproducible, and customizable tool to support molecular breeding in sorghum and underscores the value of targeted marker systems in resource-optimized crop improvement programs.
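Call rates like those reported for the array are computed per sample (row) and per marker (column) of the genotype matrix. The helper below is a hypothetical sketch that treats None as a missing call; a >90% filter would then keep the samples or markers whose rate exceeds 0.9.

```python
def call_rates(matrix):
    """Per-sample and per-marker call rates.

    matrix[s][m] is the genotype call of sample s at marker m, or None if missing.
    """
    n_samples, n_markers = len(matrix), len(matrix[0])
    sample_rates = [sum(call is not None for call in row) / n_markers for row in matrix]
    marker_rates = [
        sum(matrix[s][m] is not None for s in range(n_samples)) / n_samples
        for m in range(n_markers)
    ]
    return sample_rates, marker_rates
```

For instance, `[i for i, r in enumerate(sample_rates) if r > 0.9]` lists the samples that pass such a threshold.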

11
Barley (Hordeum vulgare L.) HvDEP1 alleles and their effect on agronomic and physical grain traits

Vu, H. M.; Coram, T. E.; Able, J. A.; Walter, J.; Coventry, S. J.; Tucker, M. R.

2026-01-30 genetics 10.64898/2026.01.27.702178 medRxiv
Top 0.1%
9.4%

The Dense and erect panicle 1 (HvDEP1) gene, located on chromosome 5H in barley (Hordeum vulgare L.), encodes a heterotrimeric G-protein γ-subunit that regulates grain size and stem elongation. Multiple alleles of HvDEP1 have been identified, including the widely used semi-dwarf allele HvDEP1.GP, caused by an insertion mutation, and a recently discovered variant, HvDEP1.V, characterized by two deletions in the putative cis-regulatory region. In this study, we evaluated the phenotypic effects of HvDEP1.V relative to HvDEP1.GP and the wild-type allele (HvDEP1.WT) using two backcross-derived populations in multi-environment field trials spanning two locations and three years. Compared with HvDEP1.GP, HvDEP1.V was associated with plants that were 5-14.6 cm taller, lodging scores 3-6.7 higher, and increased head loss. HvDEP1.V showed agronomic attributes comparable to those of HvDEP1.WT. Substituting HvDEP1.V for HvDEP1.GP significantly increased all physical grain attributes, including grain width (1.44-4.24% in three of five environments), grain length (4.88-8.69%), grain area (6.45-11.06%), and thousand-grain weight (6.75-13.8%). Compared with HvDEP1.WT, HvDEP1.V was associated with wider grain in three of five environments, shorter grain in four, and increased grain roundness in four. These findings link allelic variation in HvDEP1 to key agronomic and physical grain traits and demonstrate the functional consequences of HvDEP1.V across diverse genetic backgrounds and field conditions, providing valuable insights for barley improvement.
Key message: By evaluating agronomic performance and physical grain traits in two genetically distinct barley populations across multiple environments, we reveal strong environment- and background-dependent effects of HvDEP1 alleles.

12
Optimizing resource allocation in Miscanthus breeding with sparse testing designs for genomic prediction

Proma, S.; Lubanga, N.; Sacks, E.; Leakey, A. D. B.; Zhao, H.; Ghimire, B. K.; Lipka, A. E.; Njuguna, J. N.; Yu, C. Y.; Seong, E. S.; Yoo, J. H.; Nagano, H.; Anzoua, K. G.; Yamada, T.; Chebukin, P.; Jin, X.; Clark, L. V.; Petersen, K. K.; Peng, J.; Sabitov, A.; Dzyubenko, E.; Dzyubenko, N.; Glowacka, K.; Nascimento, M.; Campana Nascimento, A. C.; Dwiyanti, M. S.; Bagment, L.; Shaik, A.; Garcia-Abadillo, J.; Jarquin, D.

2026-03-23 genomics 10.64898/2026.03.18.712722 medRxiv
Top 0.1%
8.7%

Phenotyping high-biomass perennial crops is laborious, and the rate of genetic gain in perennial crop breeding programs is typically low. It is therefore especially important to identify methods that make the breeding process more efficient. Miscanthus is a C4 perennial grass with favorable characteristics for producing biomass as a feedstock for biofuels and diverse biobased products. Increasing biomass yield will increase profitability and environmental benefits, so it is a key target for Miscanthus breeding. In addition, identifying genotypes that are well adapted across a wide range of environmental conditions requires multi-environment trials (METs). Sparse testing is a genomic prediction-based strategy that reduces phenotyping costs in METs by evaluating a subset of genotypes in a subset of environments and then predicting the performance of the unobserved genotype-environment combinations. A Miscanthus sacchariflorus (MSA) population comprising 336 genotypes observed across three environments was analyzed. Three prediction models including main effects (environments, genotypes, genomic) and interaction effects (genotype-by-environment; GxE interaction) were implemented to predict dry biomass yield (YDY), total culm number (TCM), average internode length (AIL), and culm node number (CNN). Multiple calibration sets of different compositions and sizes were considered to evaluate performance in terms of predictive ability (PA) and mean square error (MSE) for a fixed testing set size. The training set size ranged from 52 to 112 genotypes, used to predict a fixed set of 224 unobserved genotypes across all three environments. The model accounting for GxE interaction gave the highest PA and the lowest MSE for CNN (PA: ~0.77, MSE: ~0.5) and YDY (PA: ~0.70, MSE: ~1.3), while for TCM and AIL, PA ranged from ~0.28 to 0.41 and MSE from ~1.3 to 4.3.
Overall, varying training sets and allocation strategies did not affect PA or MSE, and allocating 52 non-overlapping and 0 overlapping genotypes per environment was the most cost-effective design. This suggests that implementing sparse testing designs could reduce phenotyping costs fivefold without compromising PA in breeding programs for perennial crops such as Miscanthus.
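Sparse-testing allocations combine overlapping genotypes, observed in every environment, with non-overlapping genotypes, each observed in only one environment. The hypothetical helper below builds such a design from a pool of candidate training genotypes; it is a sketch of the general idea and does not reproduce the paper's calibration sets or their sizes.

```python
import random


def sparse_allocation(candidates, environments, n_overlap, n_unique, seed=1):
    """Hypothetical sparse-testing design.

    n_overlap genotypes are planted in every environment; each environment
    also receives n_unique genotypes that appear nowhere else.
    """
    needed = n_overlap + n_unique * len(environments)
    if needed > len(candidates):
        raise ValueError("not enough candidate genotypes for this design")
    rng = random.Random(seed)
    pool = rng.sample(list(candidates), needed)  # draw without replacement
    shared = pool[:n_overlap]
    design = {}
    for i, env in enumerate(environments):
        start = n_overlap + i * n_unique
        design[env] = shared + pool[start:start + n_unique]
    return design
```

Setting `n_overlap=0` gives a fully non-overlapping design of the kind the abstract reports as most cost-effective; the unobserved genotype-environment cells would then be predicted by a GxE-aware model.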

13
MGIDI selection and machine learning reveal harvest index driving traits in sodium azide-induced rice mutants with SSR-based genetic diversity

Al Mamun, S. M. A.; Rezve, M.; Sorker, M. B. A.; Shoun, M. M. H.; Sultana, M. S.; Pandit, A. A.; Ray, J.; Islam, M. M.

2026-02-18 plant biology 10.64898/2026.02.17.706299 medRxiv
Top 0.1%
8.3%

Sodium azide mutagenesis offers a powerful approach to generating genetic diversity for rice improvement, yet comprehensive characterization of mutant populations with integrated modern breeding tools remains limited. M mutants of BRRI dhan28 induced with sodium azide were evaluated for 17 agronomic traits, and their genetic diversity was characterized using 30 SSR markers. The multi-trait genotype-ideotype distance index (MGIDI) was used to identify elite genotypes, and machine learning approaches were used to dissect the trait architecture underlying harvest index. Principal component analysis captured 52.12% of the phenotypic variation, and yield had the highest genotypic variance (278.22) and genotypic coefficient of variation (29.07%). MGIDI analysis identified 10 elite mutants with significantly superior combined yield and harvest index within the same environment. Random Forest analysis showed that grain yield and straw yield were the main predictors of harvest index variability. The SSR markers revealed a high level of genetic diversity (PIC = 0.264), population structure analysis revealed two subgroups (Fst = 0.0437), and pairwise genetic distances ranged from 0.000 to 0.733. Procrustean alignment showed a high correlation between molecular and phenotypic variation. Integrating MGIDI selection with machine learning-based prediction of diversity underpinned the identification of elite mutants that can be quickly advanced in breeding programs. This study provides valuable genetic resources and demonstrates that sodium azide mutagenesis combined with modern analytical tools accelerates genetic gain in rice improvement.
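The genotypic coefficient of variation (GCV) is usually computed as the genotypic standard deviation relative to the trait mean. A minimal sketch of that standard formula, together with the phenotypic coefficient of variation (PCV) and broad-sense heritability; the variance components are assumed to come from a separately fitted model, and the function name is illustrative.

```python
import math


def genetic_parameters(var_g, var_p, trait_mean):
    """GCV and PCV (in %) and broad-sense heritability from variance components."""
    gcv = 100 * math.sqrt(var_g) / trait_mean   # genotypic SD as % of the mean
    pcv = 100 * math.sqrt(var_p) / trait_mean   # phenotypic SD as % of the mean
    h2 = var_g / var_p                          # broad-sense heritability
    return gcv, pcv, h2
```

Because GCV scales by the trait mean, it allows traits measured on different scales, such as yield and harvest index, to be compared.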

14
Introgression from the wild relative Manihot glaziovii on cassava (M. esculenta) chromosome 1 exhibits segregation distortion and no direct effect on dry matter

Villwock, S. S. C.; Rabbi, I. Y.; Ikpan, A. S.; Ogunpaimo, K.; Nafiu, K.; Kayondo, S. I.; Wolfe, M.; Jannink, J.-L.

2026-02-21 genetics 10.64898/2026.02.20.707074 medRxiv
Top 0.1%
8.2%

The cassava (Manihot esculenta) genome carries two large introgressions from its wild relative M. glaziovii, on chromosomes 1 and 4, that originate from historical hybridization efforts. The 10 Mbp chromosome 1 introgression has been increasing in frequency in African breeding populations because of its statistical association with higher dry matter content and root number. However, the region also exhibits suppressed recombination, hindering breeders' ability to combine favorable glaziovii alleles with the cultivated esculenta background. Because homozygous introgressed lines are rarely selected for advanced trials, dominance effects have not been well characterized. To analyze the effects of the introgression at higher resolution, we generated a population of over 5,000 seedlings from crosses between heterozygous introgressed parents and screened for recombinants using ten KASP markers tagging glaziovii-specific alleles. An optimized subset of 453 lines was then selected and evaluated over two years for yield and vigor traits. In contrast to previous studies, composite interval mapping and mixed linear models showed no significant associations between glaziovii alleles and dry matter content or root number. Small, opposing effects on clonal vigor were observed at different ends of the introgression. The region showed significant segregation distortion and enrichment of putative deleterious alleles. Alignment of the M. esculenta and M. glaziovii genome assemblies did not reveal any major structural variants in the introgression region, suggesting that suppressed recombination is likely driven by sequence-level divergence rather than structural rearrangements. These results indicate that the glaziovii introgression does not directly contribute to dry matter, supporting the need for recombination and purging of the glaziovii introgression to aid cassava improvement.
Plain language summary: A large chromosome segment from a wild relative of cassava is an important structural feature of the cassava genome. Because the segment tends to be inherited as one block, its effects on cassava traits were not well resolved. Through higher-resolution genetic mapping, we found that the wild segment affects early vigor and, contrary to previous findings, does not appear to affect dry matter. Although there are no major structural differences between the wild and cultivated chromosome segments, their overall divergence seems to suppress pairing between the wild and cultivated segments during reproduction. In the apparent absence of any major benefits from the wild segment, removing it from the breeding population may be beneficial.
Core ideas:
- A set of glaziovii allele-specific markers was designed to track the chromosome 1 introgression haplotype.
- Segregation distortion suggests the presence of recessive deleterious or lethal alleles in the introgression.
- Increased recombination is needed to purge deleterious alleles enriched in the introgression region.
- The glaziovii introgression was associated with a slightly lower vigor rating and stem diameter.
- The effects of the previously identified glaziovii DM QTL were not detected in this population.
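In crosses between parents that are both heterozygous for the introgression, a marker tagging the glaziovii allele is expected to segregate 1:2:1, so segregation distortion can be tested with a chi-square goodness-of-fit test on the three genotype classes. A minimal stdlib sketch, assuming that test (the authors' exact procedure is not stated); with two degrees of freedom, the chi-square upper-tail probability is exactly exp(-x/2), so no statistics library is needed. The function name is hypothetical.

```python
import math


def segregation_test(n_ref_hom, n_het, n_intro_hom):
    """Chi-square goodness-of-fit test against a 1:2:1 ratio (2 degrees of freedom).

    Counts are offspring that are homozygous for the cultivated allele,
    heterozygous, and homozygous for the introgressed (glaziovii) allele.
    """
    observed = [n_ref_hom, n_het, n_intro_hom]
    total = sum(observed)
    expected = [total * r for r in (0.25, 0.5, 0.25)]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # For 2 degrees of freedom the chi-square survival function is exp(-x / 2)
    p_value = math.exp(-chi2 / 2)
    return chi2, p_value
```

For example, 10 : 50 : 40 offspring give chi2 = 18 and p = exp(-9), about 1.2e-4; a deficit of one homozygous class is consistent with recessive deleterious alleles linked to that haplotype.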

15
Bayesian AMMI-Based Simulation of Genotype × Environment Interactions

Lee, H.; Segae, V. S.; Garcia-Abadillo, J.; de Oliveira Bussiman, F.; Trujano Chavez, M. Z.; Hidalgo, J.; Jarquin, D.

2026-03-15 bioinformatics 10.64898/2026.03.11.711188 medRxiv
Top 0.1%
6.9%

Genotype-by-environment interaction (GEI) has been studied to identify genotypes that are stable or favorable across environments. Simulating GEI could help refine such inference by incorporating tangible factors such as genomic and environmental information. The Bayesian additive main effects and multiplicative interaction (Bayesian AMMI) model captures genotype-specific responses across environments, reflecting directional relationships between genotypes and environments. We therefore propose a Bayesian AMMI-based GEI simulation framework that uses high-throughput environmental covariance matrices to generate GEI effects with an interpretable directional structure. To demonstrate the approach, two simulated phenotypes were assessed under four levels of GEI variance. In the first simulation (Sim1), GEI effects were sampled from a multivariate normal distribution defined by the GEI matrix. In the second simulation (Sim2), GEI effects were generated by extending Sim1 with the Bayesian AMMI model. In both simulations, increasing GEI variance resulted in lower correlations of phenotypes across environments and stronger genotype-specific sensitivity to environmental variation. Across five cross-validation designs, models accounting for GEI consistently outperformed a model that did not, with prediction accuracy generally decreasing as GEI variance increased. Biplot analyses showed clear distinctions between the two simulated phenotypes: Sim2 captured environmental relatedness and genotype-specific responses, whereas such structure was absent in Sim1. These results demonstrate that the proposed Bayesian AMMI-based GEI simulation framework enables interpretable visualization of GEI and supports genomic selection strategies under complex environmental conditions.
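The two simulation schemes can be sketched with NumPy. This is an illustrative sketch under stated assumptions, not the authors' framework: the genomic (G) and environmental (W) relationship matrices are both built here as standardized cross-products of hypothetical markers and environmental covariates, Sim1 draws vec(GE) from a multivariate normal with covariance proportional to W ⊗ G, and Sim2 is approximated by keeping only the leading multiplicative (AMMI-type) terms of a Sim1 draw rather than by fitting a Bayesian AMMI model.

```python
import numpy as np

def relationship_from_covariates(X: np.ndarray) -> np.ndarray:
    """Standardized cross-product relationship matrix (rows = genotypes or environments)."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    return Xs @ Xs.T / X.shape[1]

def sqrt_psd(K: np.ndarray) -> np.ndarray:
    """Return L with L @ L.T == K for a symmetric positive semi-definite K."""
    vals, vecs = np.linalg.eigh(K)
    return vecs * np.sqrt(np.clip(vals, 0.0, None))   # scale each eigenvector column

def simulate_gei_mvn(G: np.ndarray, W: np.ndarray, var_ge: float,
                     rng: np.random.Generator) -> np.ndarray:
    """Sim1-style draw: vec(GE) ~ N(0, var_ge * kron(W, G)).

    Uses vec(L_G Z L_W^T) = kron(L_W, L_G) vec(Z), so the
    (n*m) x (n*m) Kronecker covariance is never formed.
    """
    Z = rng.standard_normal((G.shape[0], W.shape[0]))
    return np.sqrt(var_ge) * sqrt_psd(G) @ Z @ sqrt_psd(W).T

def ammi_truncate(ge: np.ndarray, n_terms: int, var_ge: float) -> np.ndarray:
    """Sim2-style stand-in: keep the leading multiplicative (AMMI) terms of a GE matrix."""
    ge = ge - ge.mean(axis=0) - ge.mean(axis=1, keepdims=True) + ge.mean()  # double-center
    U, s, Vt = np.linalg.svd(ge, full_matrices=False)
    ge_k = (U[:, :n_terms] * s[:n_terms]) @ Vt[:n_terms]   # sum_k lambda_k * alpha_ik * gamma_jk
    return ge_k * np.sqrt(var_ge / ge_k.var())              # rescale to the target GEI variance

rng = np.random.default_rng(1)
markers = rng.integers(0, 3, size=(50, 500)).astype(float)  # 50 genotypes x 500 SNPs (hypothetical)
env_cov = rng.normal(size=(6, 20))                          # 6 environments x 20 covariates (hypothetical)
G = relationship_from_covariates(markers)
W = relationship_from_covariates(env_cov)
ge1 = simulate_gei_mvn(G, W, var_ge=0.5, rng=rng)           # Sim1-like
ge2 = ammi_truncate(ge1, n_terms=2, var_ge=0.5)             # Sim2-like, rank 2
```

The truncated matrix ge2 has genotype scores (rows of U) and environment scores (rows of Vt) that can be drawn directly as a biplot, which is the kind of directional structure the abstract attributes to Sim2.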

16
Genetic basis of Cassava (Manihot esculenta Crantz) plant architecture and its relevance for selection of farmer-preferred varieties

Okoma, P. M.; Kayondo, S. S.; Rabbi, I. Y.; Amaefula, C.; de Andrade, L. R. B.; Jiwuba, L. C.; Onyeka, J.; Egesi, C. N.; Jannink, J.-L.

2026-02-12 genetics 10.64898/2026.02.11.705251 medRxiv
Top 0.1%
6.6%

Plant architecture, the spatial configuration of stems, branches, leaves, and inflorescences, underpins essential physiological functions such as light capture, assimilate partitioning, flowering, and ultimately yield. In cassava (Manihot esculenta), architectural traits such as plant height, branching level, and plant shape are agronomically important yet remain underexploited in breeding. Here, a large-scale analysis was conducted using phenotypic and genomic data from more than 14,000 cassava accessions evaluated across 34 field locations in Nigeria between 2010 and 2021, encompassing the breeding programs of the National Root Crops Research Institute and the International Institute of Tropical Agriculture. The study aimed to dissect the genetic architecture, environmental stability, and breeding relevance of four key traits: plant full height, height to first branching, branching level number (BranchlevelNum), and plant shape. Phenotypic analyses across breeding stages revealed consistent variation in plant height, branching height, and branching intensity, reflecting the cumulative effects of selection and evaluation across environments. Broad-sense heritability estimates ranged from 0.41 to 0.72, with BranchlevelNum and Cylindrical shape exhibiting strong genetic control and weak correlations with yield components, indicating their suitability for independent improvement. Genome-wide association analyses identified significant loci associated with BranchlevelNum, including a major region on chromosome 2 and an additional locus on chromosome 13, collectively explaining approximately 11% of the phenotypic variance. Candidate genes within these regions included regulators of meristem activity and hormone-related pathways, supporting a developmental basis for branching variation. Genomic prediction accuracy for BranchlevelNum reached 0.44, comparable to values reported for key agronomic traits in cassava.
These results demonstrate that branching-related architectural traits are genetically tractable, largely independent of yield, and amenable to genomic selection. The findings support the integration of BranchlevelNum and plant shape into ideotype-driven breeding frameworks aimed at improving flowering efficiency, canopy structure, and field performance in cassava.
Author Summary
Cassava is a major food crop, and its plant shape plays an important role in how easily it can be grown, harvested, and improved through breeding. Traits such as plant height, branching, and canopy form affect flowering, seed production, and field management, yet they have received much less attention than yield or disease resistance. In this study, we examined plant architecture using field and genetic data from more than 14,000 cassava plants grown across Nigeria over twelve years. We focused on key traits describing plant height, branching level, and overall plant shape. We found that branching level is strongly controlled by genetics, remains stable across environments, and can be predicted accurately using genomic data. We also identified specific regions of the cassava genome linked to branching behavior. Our findings show that plant architecture can be improved using modern breeding tools without compromising yield. Incorporating branching traits into breeding programs can help develop cassava varieties that flower more reliably and perform better in farmers' fields.
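The abstract reports broad-sense heritability estimates but does not state which definition was used. A common entry-mean formulation for multi-environment trials is sketched below with made-up variance components; it shows why the same genetic variance yields a higher H² when genotype means are averaged over more environments and replicates.

```python
def broad_sense_h2(var_g: float, var_ge: float, var_e: float,
                   n_env: int = 1, n_rep: int = 1) -> float:
    """Broad-sense heritability on an entry-mean basis.

    H2 = var_g / (var_g + var_ge / n_env + var_e / (n_env * n_rep)).
    With n_env = n_rep = 1 this reduces to plot-level heritability.
    """
    if min(var_g, var_ge, var_e) < 0:
        raise ValueError("variance components must be non-negative")
    return var_g / (var_g + var_ge / n_env + var_e / (n_env * n_rep))

# Hypothetical variance components for a branching trait.
print(broad_sense_h2(0.4, 0.2, 0.4))                    # plot level
print(broad_sense_h2(0.4, 0.2, 0.4, n_env=4, n_rep=2))  # mean over 4 environments x 2 reps
```

In practice the variance components would come from a fitted mixed model, and unbalanced multi-year data such as these usually call for a generalized heritability measure rather than this balanced-design formula.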

17
Uncovering genetic mechanisms underlying trait variation in switchgrass using explainable artificial intelligence

Izquierdo, P.; Weng, X.; Juenger, T.; Bonnette, J. E.; Yoshinaga, Y.; Daum, C.; Lipzen, A.; Barry, K.; Blow, M. J.; Lehti-Shiu, M. D.; Lowry, D.; Shiu, S.-H.

2026-03-09 genetics 10.64898/2026.03.06.710154 medRxiv
Top 0.1%
6.6%

Uncovering the genetic architecture of quantitative traits is challenging because polygenic control yields small individual gene effects and because gene-gene and genotype-by-environment interactions add further complexity. To understand the genetic basis of polygenic traits and their plasticity across environments, we integrated genome-wide SNPs and RNA-seq transcript data with interpretable statistical and machine learning models in a switchgrass (Panicum virgatum) diversity panel grown at contrasting field sites in Michigan and Texas. Notably, in addition to predicting traits within single environments, our models were able to predict phenotypic differences across environments, i.e., plasticity. By interpreting trait prediction models with explainable artificial intelligence methods, we identified important features: genes that are the most predictive of flowering time and annual biomass production across environments, based on their associated gene expression levels and nearby SNPs. This approach recovered canonical flowering regulators and revealed novel, environment-specific candidate flowering genes. Further, transcriptome models consistently recovered more switchgrass genes homologous to experimentally validated genes in Arabidopsis and rice than SNP-based models did. Feature interaction scores from the models also allowed the identification of trait- and environment-dependent gene-gene interactions, with flowering time showing stronger and more abundant interactions than biomass. While some of the interactions identified are consistent with the link between flowering time and yield, most are novel predictors that need further evaluation. Together, these results demonstrate that interpretable genomic prediction with explainable artificial intelligence approaches can convert trait prediction models into mechanistic hypotheses about putative causal genes and interactions controlling traits within and across environments.
These results will help to prioritize target genes for validation and inform germplasm selection for cultivar improvement.
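The abstract does not name the specific model or explainable-AI method, so the sketch below uses a simpler stand-in: ridge regression predicting plasticity (here assumed to be the Texas minus Michigan phenotype), interpreted with permutation importance, the drop in held-out R² when one feature is shuffled. The expression matrix, gene indices, and phenotypes are all hypothetical.

```python
import numpy as np

def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression on centered data; returns (coef, x_mean, y_mean)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    coef = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ (y - y_mean))
    return coef, x_mean, y_mean

def predict(model, X):
    coef, x_mean, y_mean = model
    return (X - x_mean) @ coef + y_mean

def r2(y, y_hat):
    return 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)

def permutation_importance(model, X, y, rng, n_repeats=20):
    """Mean drop in R^2 on (X, y) when each feature column is shuffled."""
    baseline = r2(y, predict(model, X))
    importance = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        for _ in range(n_repeats):
            X_perm = X.copy()
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            importance[j] += baseline - r2(y, predict(model, X_perm))
    return importance / n_repeats

# Hypothetical data: 200 genotypes x 30 gene-expression features.
rng = np.random.default_rng(0)
expr = rng.normal(size=(200, 30))
flowering_mi = 60 + expr[:, 3] + rng.normal(scale=0.5, size=200)
flowering_tx = 50 + expr[:, 3] + 2 * expr[:, 7] + rng.normal(scale=0.5, size=200)
plasticity = flowering_tx - flowering_mi   # gene 3 cancels out; gene 7 drives plasticity

model = fit_ridge(expr[:150], plasticity[:150])                         # train on 150 genotypes
imp = permutation_importance(model, expr[150:], plasticity[150:], rng)  # score on 50 held out
print(np.argsort(imp)[::-1][:3])  # most important features first
```

In this toy example gene 3 affects flowering at both sites but not plasticity, which illustrates why features that predict a trait within one environment need not be the ones that predict its change across environments. Scoring importance on held-out genotypes avoids rewarding features the model merely overfits.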

18
What makes a banana false? How the genome of Ethiopian orphan staple Ensete ventricosum differs from the banana A and B sub-genomes

Muzemil, S.; Paul, P.; Baxter, L.; Dominguez-Ferreras, A.; Sahu, S. K.; Van Deynze, A.; Mai, G.; Yemataw, Z.; Tesfaye, K.; Ntoukakis, V.; Studholme, D. J.; Grant, M.

2026-02-23 bioinformatics 10.64898/2026.02.21.706659 medRxiv
Top 0.1%
6.5%

Background
Ensete ventricosum, also known as the "tree against hunger", plays a key role in Ethiopian food security and farming systems, feeding more than 20 million people. Since domestication via clonal selection in the south-west Ethiopian highlands, today's diverse enset landraces provide multiple benefits to farmers and local communities, including food, fibre as a by-product, animal bedding, and cattle fodder. Improved genomic resources for this highly drought-tolerant plant are essential to supplement the conventional clonal selection-based breeding programme and pave the way towards targeted breeding.
Results
We sequenced the genome of enset landrace Mazia, which is partially resistant/tolerant to Xanthomonas wilt, and predicted 38,940 protein-coding genes. The Mazia assembly (540.14 Mb) is more complete than the previously published genome assembly of landrace Bedadeti (451.28 Mb) and displays 1.41% heterozygosity and 64.64% repetitive DNA content. Comparative analyses with the Bedadeti assembly and with chromosome-level genome sequences of the two main banana progenitors (Musa acuminata, AA genome; Musa balbisiana, BB genome) unexpectedly revealed that ~25% of the Mazia genome is unique to enset. Gene Ontology (GO) and sequence similarity search analyses of enset-specific protein-coding genes identified distinct functional signatures that underpin the lifestyle, adaptation, and corm productive quality of enset, including functions related to DNA integration, carbohydrate metabolism, disease resistance, and transcriptional regulation. In contrast, Musa-specific genes showed enrichment for defence response, protein phosphorylation, and fruit development pathways. Focusing on the classical nucleotide-binding site leucine-rich repeat (NLR) disease resistance genes, we identified and characterised NLRs in enset and Musa genomes, revealing a considerable expansion in the Musa acuminata genome.
We also identified unique genes in enset and banana genomes whose functional and evolutionary roles are yet to be determined.
Conclusions
Here, we report a de novo genome assembly for the enset (Ensete ventricosum) landrace Mazia and provide a high-quality annotation of both Mazia and the previously published assembly of the landrace Bedadeti. Collectively, these genomic resources provide a valuable foundation for comparative genomics within the Musaceae family and open new opportunities for the development of marker-assisted breeding strategies to accelerate the improvement of agronomically important traits in enset.
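One simple way to express a statement such as "~25% of the genome is unique to enset" is the fraction of assembly bases not covered by any alignment to the comparison genomes. The sketch below assumes alignment blocks from all comparison genomes have already been produced by a whole-genome aligner and reduced to half-open (sequence, start, end) intervals on the Mazia assembly; the abstract does not describe the authors' actual procedure, and the sequence names and coordinates are hypothetical.

```python
from collections import defaultdict
from typing import Iterable

def covered_length(intervals: list[tuple[int, int]]) -> int:
    """Number of bases covered by the union of half-open (start, end) intervals."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            # Disjoint from the current run: close it and start a new one.
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping or abutting: extend the current run.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total

def unaligned_fraction(seq_lengths: dict[str, int],
                       aligned: Iterable[tuple[str, int, int]]) -> float:
    """Fraction of assembly bases not covered by any alignment block."""
    by_seq: dict[str, list[tuple[int, int]]] = defaultdict(list)
    for seq_id, start, end in aligned:
        by_seq[seq_id].append((start, end))
    covered = sum(covered_length(iv) for iv in by_seq.values())
    return 1.0 - covered / sum(seq_lengths.values())

# Hypothetical assembly and alignment blocks (blocks may overlap).
lengths = {"scaffold_1": 1000, "scaffold_2": 500}
blocks = [("scaffold_1", 0, 400), ("scaffold_1", 300, 700),
          ("scaffold_2", 0, 250), ("scaffold_2", 250, 425)]
print(unaligned_fraction(lengths, blocks))  # 0.25
```

Merging overlapping blocks before summing matters: without it, bases aligned to both Musa genomes (or by overlapping blocks) would be counted twice and the unique fraction underestimated.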

19
A Bayesian multidimensional approach to decipher the genetic basis of dynamic phenotypes in multiple species

Blois, L.; Heuclin, B.; Bernard, A.; Denis, M.; Dirlewanger, E.; Foulongne-Oriol, M.; Marullo, P.; Peltier, E.; Quero-Garcia, J.; Marguerit, E.; Gion, J.-M.

2026-04-03 genetics 10.64898/2026.04.01.715770 medRxiv
Top 0.1%
6.5%

Deciphering the genetic architecture of complex quantitative phenotypes remains challenging in quantitative genetics. These traits not only depend on multiple genetic factors but are also established over time and across environments. Although quantitative genetics has investigated the genetic determinism of phenotypic plasticity under contrasting environmental conditions, time-related phenotypic plasticity has received less attention. Here we propose a multivariate Bayesian framework, the Bayesian Varying Coefficient Model (BVCM), designed to analyse the genetic architecture of time-related phenotypic plasticity with a multilocus approach. We applied the BVCM to time-series phenotypes measured at various time scales (daily, monthly, yearly) across a diverse set of species: yeast (Saccharomyces cerevisiae), a fungus (Fusarium graminearum), eucalyptus (Eucalyptus urophylla × E. grandis), and sweet cherry (Prunus avium). The BVCM results were compared with those of a standard genome-wide association method applied separately at each time point. For all species and traits, the BVCM detected the major QTLs identified by marker-trait association methods and revealed additional genomic regions of weak effect. It also increased the phenotypic variance explained for most of the phenotypes considered, and it revealed dynamic QTLs with transitory, increasing, or decreasing effects over time. By considering both the temporal and genetic multivariate structures in a single statistical model, we improved our understanding of the genetic architecture of complex traits, notably by reducing the issue of missing heritability. More broadly, this work lays the foundation for extended applications in functional genomics, evolutionary ecology, and crop breeding programs, in which time-related phenotypic plasticity remains crucial for predicting and selecting key complex quantitative traits.
Key message
By capturing the genetic factors influencing time-related phenotypic plasticity, our approach contributes to a deeper understanding of the dynamic nature of genotype-phenotype relationships.
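The core idea of a varying-coefficient QTL model, a marker effect β(t) that changes smoothly over time instead of being estimated independently at each time point, can be illustrated with a short least-squares sketch. This is not the BVCM: it fits one marker at a time by ordinary least squares with an assumed polynomial basis, whereas the BVCM is Bayesian and multilocus. The data are simulated and hypothetical.

```python
import numpy as np

def poly_basis(times: np.ndarray, degree: int) -> np.ndarray:
    """Polynomial basis phi_k(t) = t**k (k = 0..degree), with t rescaled to [0, 1]."""
    t = (times - times.min()) / (times.max() - times.min())
    return np.vander(t, degree + 1, increasing=True)

def fit_varying_coefficient(Y: np.ndarray, x: np.ndarray, times: np.ndarray, degree: int = 2):
    """Least-squares fit of Y[i, t] = mu(t) + x[i] * beta(t) + e for a single marker.

    Y: (n_individuals, n_times) phenotypes; x: (n_individuals,) marker dosages.
    mu(t) and beta(t) are both expanded in the same polynomial basis.
    Returns the fitted curves mu(t) and beta(t) at the observed times.
    """
    n, T = Y.shape
    Phi = poly_basis(times, degree)                  # (T, K)
    Phi_rows = np.tile(Phi, (n, 1))                  # row i*T + t holds phi(t)
    x_rows = np.repeat(x, T)[:, None]                # row i*T + t holds x[i]
    D = np.hstack([Phi_rows, x_rows * Phi_rows])     # design for mu and beta coefficients
    coef, *_ = np.linalg.lstsq(D, Y.reshape(-1), rcond=None)
    K = Phi.shape[1]
    return Phi @ coef[:K], Phi @ coef[K:]

# Hypothetical data: 100 individuals phenotyped at 10 time points; the
# marker effect grows linearly from 0 to 0.5 over the series.
rng = np.random.default_rng(2)
times = np.arange(10.0)
ts = times / 9.0
x = rng.integers(0, 3, size=100).astype(float)
Y = (1 + 2 * ts)[None, :] + x[:, None] * (0.5 * ts)[None, :] + rng.normal(scale=0.1, size=(100, 10))
mu_hat, beta_hat = fit_varying_coefficient(Y, x, times, degree=2)
print(np.round(beta_hat, 2))
```

Because all time points share the basis coefficients, information is pooled across the series, which is what lets a varying-coefficient model describe a QTL whose effect increases, decreases, or is transitory over time.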

20
Genomic and pedigree-based approaches to predict parental breeding values for nut and kernel traits in almond (Prunus dulcis (Mill.) D. A. Webb)

Goonetilleke, S.; Wilkinson, M. J.; Wirthensohn, M. G.; Collins, C.; Furtado, A.; Henry, R. J.; Hardner, C.

2026-01-24 genetics 10.64898/2026.01.22.701136 medRxiv
Top 0.1%
6.4%

The self-incompatibility, perennial growth habit, large tree size, and long juvenility of almond (Prunus dulcis (Mill.) D. A. Webb) present challenges in applying traditional breeding approaches. Moreover, nut and kernel traits in almond are mainly controlled by a large number of small-effect quantitative trait loci (QTLs), and improving such complex traits through conventional breeding approaches is slow and often inefficient. Genome-wide selection is a promising strategy to enhance the efficiency of cultivar identification and of the selection of superior parents in almond breeding programs by estimating breeding values (BVs) at early maturity. The main aim of this study was to implement genomic (GBLUP) and pedigree-based (ABLUP) prediction approaches to estimate BVs and identify superior parental candidates for improving nut and kernel traits in almond. Here, we estimated BVs for nine traits commonly used in the primary evaluation stage of almond breeding, using genomic data from 61 parents and phenotypic data from 15,281 progeny derived from 205 unique families. Breeding values obtained from the two approaches were strongly correlated (r ≥ 0.94) for all traits except shell seal (r = 0.87). Population structure analysis using 90K high-quality single nucleotide polymorphisms (SNPs) indicated clear separation of Californian, European, and some old Australian almond cultivars, with considerable admixture in some cultivars. Following further validation, both prediction approaches could be useful for early identification of superior candidates. The slightly higher breeding values obtained with GBLUP compared with ABLUP suggest that accounting for within-family variation and realised genomic relationships can enhance prediction accuracy, reliability, and overall genomic prediction performance in almond.
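The GBLUP step can be sketched with NumPy. This minimal sketch assumes the VanRaden (2008) method-1 genomic relationship matrix, a single-trait model with only an overall mean as fixed effect, and a known variance ratio λ = σ²e/σ²u; the genotypes and phenotypes are simulated, and it ignores that in the almond study the parents' BVs are informed by progeny records. ABLUP has the same form with the pedigree relationship matrix A in place of G.

```python
import numpy as np

def vanraden_g(M: np.ndarray) -> np.ndarray:
    """VanRaden (2008) method-1 genomic relationship matrix.

    M: (n_individuals, n_snps) allele dosages coded 0/1/2.
    """
    p = M.mean(axis=0) / 2.0                  # allele frequencies estimated from the sample
    Z = M - 2.0 * p                           # center each SNP column
    return Z @ Z.T / (2.0 * np.sum(p * (1.0 - p)))

def gblup(y: np.ndarray, G: np.ndarray, lam: float):
    """BLUP of breeding values for y = 1*mu + u + e, u ~ N(0, G*s2u), e ~ N(0, I*s2e).

    lam = s2e / s2u is assumed known. Working with V = G + lam*I means G
    itself never has to be inverted (a centered G is singular).
    """
    n = len(y)
    V = G + lam * np.eye(n)
    ones = np.ones(n)
    mu = ones @ np.linalg.solve(V, y) / (ones @ np.linalg.solve(V, ones))  # GLS mean
    u = G @ np.linalg.solve(V, y - mu)                                     # BLUP of u
    return mu, u

# Hypothetical data: 61 genotyped parents, 2,000 SNPs, heritability about 0.5.
rng = np.random.default_rng(3)
M = rng.binomial(2, 0.3, size=(61, 2000)).astype(float)
effects = rng.normal(scale=0.05, size=2000)
true_bv = (M - M.mean(axis=0)) @ effects
y = 10 + true_bv + rng.normal(scale=true_bv.std(), size=61)
G = vanraden_g(M)
mu, bv = gblup(y, G, lam=1.0)
print(np.corrcoef(bv, true_bv)[0, 1])
```

Because every SNP column is centered, the rows of G sum to zero and the predicted breeding values sum to zero, so they are deviations from the population mean μ.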